physical examination
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models
Chiu, Christopher, Pitis, Silviu, van der Schaar, Mihaela
Clinical reasoning in medicine is a hypothesis-driven process where physicians refine diagnoses from limited information through targeted history, physical examination, and diagnostic investigations. In contrast, current medical benchmarks for large language models (LLMs) primarily assess knowledge recall through single-turn questions, where complete clinical information is provided upfront. To address this gap, we introduce VivaBench, a multi-turn benchmark that evaluates sequential clinical reasoning in LLM agents. Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a viva voce (oral) examination in medical training, requiring agents to actively probe for relevant findings, select appropriate investigations, and synthesize information across multiple steps to reach a diagnosis. While current LLMs demonstrate competence in diagnosing conditions from well-described clinical presentations, their performance degrades significantly when required to navigate iterative diagnostic reasoning under uncertainty in our evaluation. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice, including: (1) fixation on initial hypotheses, (2) inappropriate investigation ordering, (3) premature diagnostic closure, and (4) failing to screen for critical conditions. These patterns reveal fundamental limitations in how current LLMs reason and make decisions under uncertainty. Through VivaBench, we provide a standardized benchmark for evaluating conversational medical AI systems for real-world clinical decision support. Beyond medical applications, we contribute to the larger corpus of research on agentic AI by demonstrating how sequential reasoning trajectories can diverge in complex decision-making environments.
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.92)
An Explainable AI Model for Predicting the Recurrence of Differentiated Thyroid Cancer
Ahmad, Mohammad Al-Sayed, Haddad, Jude
Thyroid carcinoma, a significant yet often controllable cancer, has seen a rise in cases, largely due to advancements in diagnostic methods. Differentiated thyroid cancer (DTC), which includes papillary and follicular varieties, is typically associated with a positive prognosis in academic circles. Nevertheless, there are still some individuals who may experience a recurrence. This study employs machine learning, particularly deep learning models, to predict the recurrence of DTC, with the goal of improving patient care through personalized treatment approaches. By analysing a dataset containing clinicopathological features of patients, the model achieved remarkable accuracy rates of 98% during training and 96% during testing. To improve the model's interpretability, we used techniques like LIME and Morris Sensitivity Analysis. These methods gave us valuable insights into how the model makes decisions. The results suggest that combining deep learning models with interpretability techniques can be extremely useful in quickly identifying the recurrence of thyroid cancer in patients. This can help in making informed therapeutic choices and customizing treatment approaches for individual patients.
- Asia > Middle East > Jordan (0.05)
- North America > United States (0.04)
- Research Report > Experimental Study (0.47)
- Research Report > New Finding (0.34)
- Health & Medicine > Therapeutic Area > Oncology > Thyroid Cancer (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Would YOU let a robot check your breasts for lumps? Ultra-sensitive robotic 'finger' could be used to diagnose cancer earlier
An ultra-sensitive robotic 'finger' that could help detect breast cancer is being developed by scientists. Experts have created a device with a sophisticated sense of touch that can take patient pulses and check for abnormal lumps. The technology could make it easier for doctors to detect diseases such as breast cancer early on, when they are more treatable. And it may also help patients feel at ease during physical examinations that can seem uncomfortable and invasive, the researchers said. While rigid robotic fingers already exist, experts have raised concerns that these devices might not be up to the delicate tasks required in a doctor's office setting.
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models
Liu, Jin, Li, Qingquan, Du, Wenlong
Current benchmarks for evaluating large language models (LLMs) suffer from issues such as restricted evaluation content, untimely updates, and a lack of optimization guidance. In this paper, we propose a new paradigm for the measurement of LLMs: Benchmarking-Evaluation-Assessment. Our paradigm shifts the "location" of LLM evaluation from the "examination room" to the "hospital". By conducting a "physical examination" on LLMs, it uses specific task-solving as the evaluation content, performs deep attribution of existing problems within LLMs, and provides recommendations for optimization.
- Health & Medicine > Diagnostic Medicine (0.52)
- Health & Medicine > Therapeutic Area (0.47)
A Breakthrough in Dementia Care: AI Can Diagnose Dementia As Accurately as Experts
A study finds that artificial intelligence for dementia diagnosis is as accurate as medical professionals with expertise in treating neurologic illnesses. More individuals are surviving into old age globally thanks to improvements in public health over the last several decades. As a result, dementia, notably Alzheimer's disease, and other conditions often linked to aging are rising sharply. This might impede the ability to provide prompt treatment to individuals in need, especially in light of a predicted physician shortage in the coming decades. According to a recent study by researchers at the Boston University School of Medicine (BUSM), computational techniques (artificial intelligence/AI) may be able to help alleviate some of the challenges associated with delivering dementia care to an aging population.
Firms partner to revolutionise telehealth in pharmacies across Africa
As part of efforts towards making medications affordable and accessible in Africa, TytoCare has announced its partnership with mPharma, a technology-driven healthcare company building Africa's largest health management organisation. A statement issued by the companies on Wednesday indicates that the partnership involves the integration of the TytoCare solution into mPharma's telehealth offerings, which enables pharmacies to provide patients with enhanced remote care through in-depth physical examinations. Both companies said the partnership will improve health care services for patients in Africa. The statement said the partnership was rolled out in June 2021 and that since then, over 8,000 people have been examined and treated by mPharma using TytoCare's platform. This spans around 35 pharmacies across Ghana, Kenya, Uganda, Zambia, and Nigeria, the statement said, adding that the partnership will provide solace to patients on the continent given the lack of adequate health care facilities in some countries in the region. The majority of the Primary Health Centres (PHCs) in Nigeria are either abandoned or providing very limited services due to inadequate manpower.
The opportunity for AI in Healthcare
Over the past decades, Artificial Intelligence (AI) has played a robust and growing role in the world. What many people don't realize is that AI presents itself in several different forms that impact everyday life. Logging into your email, social media, car ride services, and online shopping platforms all involve AI algorithms to ensure a better user experience. The medical field is one key area where AI is experiencing rapid growth, specifically in managing treatment and diagnostics. Significant research is under way into how AI can aid clinical decisions, increase the efficiency of treatment, and support human judgment.
TytoCare Brings Telehealth to Ukraine Through GIVA Care Partnership
The launch will give physicians and consumers throughout the country access to TytoCare's leading, AI-powered remote examination solution, replicating in-person visits from the comfort of home during the pandemic and beyond. TytoCare, the global healthcare industry's first all-in-one modular device and examination platform for AI-powered, on-demand, remote medical exams, today announced the launch of its telehealth solution in Ukraine. The launch is taking place via an exclusive partnership with GIVA Care Group, a leading healthcare distribution company, which will introduce TytoCare to the Ukrainian healthcare industry. This marks the first time such an advanced telehealth device and platform will be available to Ukrainian physicians and consumers. The COVID-19 pandemic has heavily impacted Ukraine, with over 37,000 fatalities and over 400,000 people currently infected with the virus. As a result, the Ukrainian healthcare ecosystem is seeking digital health solutions that will help physicians provide the best possible care both for coronavirus patients and for the general population seeking ongoing care and looking to avoid hospitals and clinics.
Are the History and Physical Coming to an End?
As far back as the 1970s, doctors have pondered whether one day, as medical technology barrels ahead, the patient history and physical examination (H&P) would eventually become obsolete. And yet, we were all told in medical school that a proper history is enough to make X percent of diagnoses, which increases further when you work in physical findings. But today we are on the brink of the era of multiomics, a term encompassing the numerous data available for patients, from genomics, epigenomics, proteomics, microbiomics, metabolomics, and an array of other omics. These days, a health dataset from a single patient can be immense, to be sure. Advances in artificial intelligence and machine learning, however, are making it possible to organize and filter multiomic data from a patient in ways that make them useful to physicians--ways that can personalize diagnosis and care, and bypass the often imperfect recollections of patients and patients' families obtained during a history.
- North America > United States > New York (0.05)
- North America > United States > Florida > Hillsborough County > Tampa (0.05)
- Europe > United Kingdom > Wales (0.05)
- Europe > United Kingdom > Northern Ireland (0.05)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.31)
Generalization Theory and Deep Nets, An introduction
Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become interested in the generalization mystery: why do trained deep nets perform well on previously unseen data, even though they have way more free parameters than the number of datapoints (the classic "overfitting" regime)? Zhang et al.'s paper Understanding Deep Learning requires Rethinking Generalization played some role in bringing attention to this challenge. Their main experimental finding is that if you take a classic convnet architecture, say Alexnet, and train it on images with random labels, then you can still achieve very high accuracy on the training data. Needless to say, the trained net is subsequently unable to predict the (random) labels of still-unseen images, which means it doesn't generalize.
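The random-label finding can be illustrated with a much smaller stand-in. The sketch below is not the convnet experiment from Zhang et al.; it assumes a linear model with more parameters than training points, fitted by minimum-norm least squares with NumPy. The function name `random_label_experiment` and all sizes are hypothetical choices made for illustration. The model memorizes random training labels, yet its accuracy on fresh random labels stays near chance.

```python
import numpy as np

def random_label_experiment(n_train=50, n_test=2000, n_features=500, seed=0):
    """Fit an overparameterized linear model to purely random labels.

    Illustrative stand-in for the random-label experiment: a linear
    model with n_features > n_train, not a deep net.
    """
    rng = np.random.default_rng(seed)

    # Inputs are Gaussian noise; labels are coin flips independent of the inputs.
    X_train = rng.standard_normal((n_train, n_features))
    X_test = rng.standard_normal((n_test, n_features))
    y_train = rng.choice([-1.0, 1.0], size=n_train)
    y_test = rng.choice([-1.0, 1.0], size=n_test)

    # With more parameters than datapoints the system is underdetermined;
    # lstsq returns the minimum-norm solution, which interpolates the labels.
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    train_acc = float(np.mean(np.sign(X_train @ w) == y_train))
    test_acc = float(np.mean(np.sign(X_test @ w) == y_test))
    return train_acc, test_acc

train_acc, test_acc = random_label_experiment()
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is perfect because the model has enough free parameters to interpolate any labeling, while test accuracy hovers around 0.5 because the labels carry no signal to generalize from.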